-
Notifications
You must be signed in to change notification settings - Fork 248
fix: delete cni statefile when unable to be parsed #3551
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
@@ -192,13 +193,19 @@ func (nm *networkManager) restore(isRehydrationRequired bool) error { | |||
// Read any persisted state. | |||
err := nm.store.Read(storeKey, nm) | |||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to log the contents invalid and all to help assist with troubleshooting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was looking into this and the store's file name and contents are not accessible-- currently leaning towards just adding a method to the interface to dump the contents of the store in string format.
This pull request is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days |
e6da159
to
a6a3b34
Compare
adds Dump to store interface which returns the raw contents of the store in string format if the store is empty or not readable it returns an error adds to a ut to check that dump effectively prints the contents of the store
a6a3b34
to
c94573b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR addresses a failure scenario where the statefile cannot be parsed due to potential corruption (e.g., null bytes), by deleting the corrupted file to allow recovery. Key changes include:
- Adding a Dump() method to the KeyValueStore interface and its implementations.
- Updating unit tests to validate Dump() behavior for both empty and populated stores.
- Adjusting the restore logic to delete the statefile when a JSON syntax error is encountered.
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
testutils/store_mock.go | Added Dump() method to the mock store for testing. |
store/store.go | Extended the KeyValueStore interface with Dump(). |
store/mockstore.go | Implemented Dump() method for the mock store. |
store/json_test.go | Added tests for verifying Dump() behavior. |
store/json.go | Added Dump() and readBytes() methods for file handling. |
network/manager.go | Updated restore logic to delete corrupted state files. |
Comments suppressed due to low confidence (1)
network/manager.go:213
- Consider handling the error returned by nm.store.Remove() to ensure that removal of the corrupted state file is successful and to improve error reporting.
nm.store.Remove()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes an issue where the CNI statefile becomes corrupted (e.g., containing null bytes) causing parsing failures by deleting the irrecoverable statefile automatically. Key changes include:
- Adding a Dump() method to the KeyValueStore interface and its implementations (jsonFileStore, mockStore, storeMock).
- Updating unit tests to check the Dump() behavior on both empty and populated statefiles.
- Modifying the restore logic in network manager to delete the statefile when encountering JSON syntax errors.
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
testutils/store_mock.go | Added Dump() stub for the KeyValueStoreMock. |
store/store.go | Updated KeyValueStore interface with Dump() method. |
store/mockstore.go | Implemented Dump() for mockStore returning formatted data. |
store/json_test.go | Updated tests to verify Dump() behavior for empty and populated stores. |
store/json.go | Refactored file reading logic; included Dump() implementation. |
network/manager.go | Enhanced state restoration by deleting corrupted statefiles. |
Comments suppressed due to low confidence (1)
store/json_test.go:169
- [nitpick] Consider unmarshaling both the dumped JSON and the expected JSON and comparing their data structures instead of modifying the string, which would improve test robustness and maintainability.
val = strings.ReplaceAll(val, " ","")
_, err = kvs.Dump() | ||
if err == nil { | ||
t.Fatal("Expected store to not exist") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We expect the store to exist right ? why the nil
check and a fatal
error ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The store struct does exist, but the actual json representation on disk doesn't exist yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the store struct is created, but it's not yet created on disk
if readErr != nil { | ||
logger.Error("Could not read corrupted state", zap.Error(readErr)) | ||
} else { | ||
logger.Info("Logging state", zap.String("stateFile", contents)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: logging corrupted state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We log the corrupted state for debugging as per the suggestion by Michael above. Is there some drawback of doing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for debugging as per the above reviewer's comments-- what other issues could arise from logging the corrupted state? It should only be dumped once as the statefile will be deleted afterwards.
This pull request is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days |
Reason for Change:
Sometimes in certain scenarios (usually windows), if there is a crash of the OS, null bytes may be written to the state and log file. When the CNI tries to restore the state, it is unable to read the statefile and fails. All subsequent retries will fail as the state is irrecoverable. This PR changes this behavior to delete the entire cni statefile if there is a syntax error (ex: if there are a bunch of null bytes in the file), as manual intervention would be needed to recover anyway. The null statefile issue only seems to appear on the pipelines on windows nodes.
Issue Fixed:
See above
Requirements:
Notes:
This issue appears sporadically